Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Conversation

@marcio-diaz
Copy link
Contributor

@marcio-diaz marcio-diaz commented Sep 22, 2019

Fixes #3592

Summary

This PR removes some inefficiencies of tree_route and calling functions. Previously, tree_route was computing the path between two blocks by loading headers from database. Now, we cache the information needed in an LRU cache in-memory.
Also, many times we were using tree_route just to check if a block is descendant of another one. Instead, this PR introduces a function lowest_common_ancestor that compute descendants in O(1) (most of the time).

As a side effect, this improves syncing speed for Kusama from ~2 hours to ~20 minutes (25 bps -> 35 bps in wasm, 80 bps -> 400 bps in native).

Changes:

  • Introduce a CachedHeaderMetadata with fields hash, number, parent and ancestor. The ancestor field is used to quickly jump through the tree.
  • We cache the last 20_000 HeaderMetadata used, by means of a LRU cache. The optimal number depends on the difference between best and finalized block numbers.
  • Use CachedHeaderMetadata to traverse the tree in the tree_route function. Previously we were loading the whole headers from DB to do this task.
  • Implement a lowest_common_ancestor function and use it in is_descendent_of and other places, replacing tree_route. Right now, the query pattern of this function is as follows: lca(best, final), lca(best+1, final), lca(best+2, final), ..., lca(best+5000, final) . By using the ancestor field of LightHeader, the first query is O(n) and the followings O(1).

Notes:

  • I also tried implementing lowest_common_ancestor by dividing the tree in sections and jumping between them, getting O(sqrt(h)) for query, but it didn't work very well in practice.

@marcio-diaz marcio-diaz added the A0-please_review Pull request needs code review. label Sep 22, 2019
@marcio-diaz marcio-diaz requested a review from arkpar September 22, 2019 08:43
@marcio-diaz marcio-diaz changed the title Optimize tree route Optimize tree route to sync faster Sep 23, 2019
@rphmeier
Copy link
Contributor

I'm curious as to what other approaches you have tried. It seems much more memory and computation light to cache the results of queries to is_descendent_of themselves. My thesis:

  • is_descendent_of queries will be common for short chains near the HEAD
  • queries deeper into the chain will tend to focus on some very specific blocks

The memory cache approach is clearly faster than doing nothing, but it requires keeping around a big LRU cache

@marcio-diaz
Copy link
Contributor Author

I'm curious as to what other approaches you have tried. It seems much more memory and computation light to cache the results of queries to is_descendent_of themselves. My thesis:

* `is_descendent_of` queries will be common for short chains near the HEAD

* queries deeper into the chain will tend to focus on some very specific blocks

The memory cache approach is clearly faster than doing nothing, but it requires keeping around a big LRU cache

It seems we are making many queries from imported block to finalized block, as I commented in the description, the cache is to keep this window. While syncing Kusama I have seen max differences of about 10k blocks. Probably we can reduce the cache size from 20k to 10k (or even more) and keep similar performance. I also thought about disabling it after main sync.

@arkpar
Copy link
Member

arkpar commented Sep 24, 2019

Right now, the query pattern of this function is as follows: lca(best, final), lca(best+1, final), lca(best+2, final), ..., lca(best+5000, final)

I wonder where is this pattern coming from. I imagine we only need to check tree paths or ancestry w.r.t the final block only when finalizing, and not when importing or checking each block.

Does GRANDPA really need to check if each new block is descendent from the last finalized block?
@andresilva @rphmeier

@rphmeier
Copy link
Contributor

Does GRANDPA really need to check if each new block is descendent from the last finalized block?
@andresilva @rphmeier

With any kind of finality, we always have to do this check. At the very least, if that block could be a new "best" block.

@andresilva
Copy link
Contributor

Some explanation of where tree_routes are being calculated regarding consensus:

  • GRANDPA - we track pending authority set changes in a tree, we calculate tree_routes when making queries on this tree, either when importing a block if it includes a consensus change digest, or when finalizing a block to check if finalizing the given block finalizes anything in the tree.

  • BABE - epoch changes are announced one epoch in advance and we also track them in a tree. Whenever we import a block that signals an epoch change we must import it into the tree, and we also query for epoch that just started (previously announced). Some changes introduced in Fixing BABE epochs to change between blocks #3583 changed some of this logic, but I think now it should cost less as we minimize the number of calls to is_descendent_of when querying the tree for pending epochs.

Nodes in the tree have an implicit ancestry relationship that is given by is_descendent_of (which uses tree_route internally).

I doubt that on Kusama the issue is being caused by GRANDPA since there are no authority changes happening. Most likely it was being caused by BABE (although that should have been improved by #3583).

Regardless of any improvements on BABE having a way to speed up common ancestry queries makes sense to me.

@rphmeier
Copy link
Contributor

rphmeier commented Sep 24, 2019

In #3586 we do is_descendent_of calls for every block, since I removed the caching that it previously used to figure out the epoch fora block. I think a really small LRU-cache to track which blocks we recently queried epoch for (so we can see if the new block would be in the same epoch as parent) would kill 95% of those queries.

Copy link
Contributor

@Demi-Marie Demi-Marie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM although I am no expert on the database.

@marcio-diaz marcio-diaz added A3-in_progress Pull request is in progress. No review needed at this stage. and removed A0-please_review Pull request needs code review. labels Sep 26, 2019
@marcio-diaz marcio-diaz force-pushed the marcio/optimize-tree-route branch from 11af594 to 03b1c77 Compare September 27, 2019 08:25
@marcio-diaz marcio-diaz added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Sep 27, 2019
@marcio-diaz
Copy link
Contributor Author

marcio-diaz commented Sep 27, 2019

This is ready for another look, I introduced the crate and traits as @rphmeier suggested (although I'm not sure about the location and name).

Copy link
Contributor

@andresilva andresilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm, minor nits.

@marcio-diaz marcio-diaz force-pushed the marcio/optimize-tree-route branch from 6f2e7d8 to 8cb5b42 Compare October 2, 2019 17:14
@marcio-diaz marcio-diaz added A8-looksgood and removed A0-please_review Pull request needs code review. labels Oct 2, 2019
@marcio-diaz marcio-diaz merged commit d7be290 into master Oct 2, 2019
@marcio-diaz marcio-diaz deleted the marcio/optimize-tree-route branch October 2, 2019 18:30
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize tree_route and related, avoid read_header

7 participants